Web spam detection based on immune clonal feature selection and under-sampling ensemble
LU Xiaoyong, CHEN Musheng, WU Jhenglong, CHANG Peichan
Journal of Computer Applications, 2016, 36(7): 1899-1903. DOI: 10.11772/j.issn.1001-9081.2016.07.1899
To address the "curse of dimensionality" and class imbalance in Web spam detection, a binary classification algorithm based on immune clonal feature selection and an Under-Sampling (US) ensemble was proposed. First, the majority-class samples in the training dataset were under-sampled into several subsets, and each subset was combined with the minority-class samples to form a balanced training subset. Next, an immune clonal algorithm was proposed to select several optimal feature subsets, and the balanced training subsets were projected onto multiple views defined by these feature subsets. Finally, several Random Forest (RF) classifiers were trained on these views of the training subsets, and the class of each test sample was determined by voting among them. Experimental results on the WEBSPAM UK-2006 dataset show that the ensemble classifier outperforms RF, Bagging with RF, and AdaBoost with RF, improving accuracy, F1-Measure, and AUC (Area Under the ROC Curve) by more than 11% each. Compared with several state-of-the-art baseline classification models, the ensemble classifier improves the F1-Measure by 2% and achieves the best AUC.
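The under-sampling ensemble described in the abstract (balanced subsets, projection onto feature views, one base model per view, voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`balanced_subsets`, `project`, `train_ensemble`, `predict`) are hypothetical; the majority class is labelled 0 and the minority class 1; the majority class is split into disjoint chunks of roughly minority-class size; every balanced subset is paired with every feature view; and a simple nearest-centroid classifier stands in for the Random Forest used in the paper so that the sketch needs only the standard library.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def balanced_subsets(majority: List[Vector], minority: List[Vector],
                     rng: random.Random) -> List[Tuple[List[Vector], List[int]]]:
    """Under-sample the majority class into k disjoint chunks of roughly
    minority size and pair each chunk with all minority samples.
    Assumed labels: 0 = majority class, 1 = minority class."""
    if not minority:
        raise ValueError("minority class is empty")
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // len(minority))
    chunks = [shuffled[i::k] for i in range(k)]  # round-robin split, every sample used once
    return [(chunk + minority, [0] * len(chunk) + [1] * len(minority))
            for chunk in chunks]


def project(x: Vector, features: Sequence[int]) -> Vector:
    """Keep only the selected feature indices, i.e. one 'view' of the sample."""
    return [x[j] for j in features]


class NearestCentroid:
    """Stand-in base learner (hypothetical); the paper trains Random Forests."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, Vector] = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Vector) -> int:
        # Label of the centroid with the smallest squared Euclidean distance.
        return min(self.centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def train_ensemble(majority: List[Vector], minority: List[Vector],
                   views: List[List[int]], seed: int = 0):
    """Train one base model per (balanced subset, feature view) pair."""
    rng = random.Random(seed)
    models = []
    for X, y in balanced_subsets(majority, minority, rng):
        for features in views:
            Xv = [project(x, features) for x in X]
            models.append((features, NearestCentroid().fit(Xv, y)))
    return models


def predict(models, x: Vector) -> int:
    """Plain majority vote over all base models (ties go to the first-seen label)."""
    votes = Counter(model.predict_one(project(x, feats)) for feats, model in models)
    return votes.most_common(1)[0][0]
```

For example, with 40 majority samples, 10 minority samples, and `views=[[0, 1], [0], [1]]`, `train_ensemble` builds 4 balanced subsets of 20 samples each and trains 12 base models; `predict` then returns the voted label for a new sample. Swapping `NearestCentroid` for a real Random Forest only requires a learner with the same `fit`/`predict_one` shape.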
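The abstract does not give the operators of the immune clonal algorithm, so the following is only a generic CLONALG-style clonal selection over binary feature masks, offered as a sketch of how "several optimal feature subsets" could be obtained. The function name `clonal_feature_selection`, its parameters, the rank-based clone counts, the mutation rates, and the random-newcomer step are all assumptions. The `affinity` callable is supplied by the caller; in a setting like this paper's it might be, for instance, the validation accuracy of a classifier restricted to the masked features, which is why fitness values are cached.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def clonal_feature_selection(n_features: int,
                             affinity: Callable[[Mask], float],
                             pop_size: int = 20,
                             n_select: int = 5,
                             clone_factor: float = 3.0,
                             generations: int = 50,
                             n_return: int = 3,
                             seed: int = 0) -> List[Mask]:
    """Hypothetical CLONALG-style search; returns the n_return best distinct
    masks found (one candidate feature subset per view)."""
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(mask: Mask) -> Tuple[float, Mask]:
        # Cache the (usually expensive) affinity; the mask itself breaks ties.
        if mask not in cache:
            cache[mask] = affinity(mask)
        return cache[mask], mask

    def random_mask() -> Mask:
        return tuple(rng.randint(0, 1) for _ in range(n_features))

    def hypermutate(mask: Mask, rate: float) -> Mask:
        # Flip each bit independently with probability `rate`.
        return tuple(b ^ 1 if rng.random() < rate else b for b in mask)

    population = [random_mask() for _ in range(pop_size)]
    n_fresh = pop_size // 5  # random newcomers per generation keep diversity
    for _ in range(generations):
        ranked = sorted(set(population), key=score, reverse=True)
        clones: List[Mask] = []
        for rank, antibody in enumerate(ranked[:n_select], start=1):
            # Better-ranked antibodies get more clones and gentler mutation.
            n_clones = max(1, int(clone_factor * n_select / rank))
            rate = min(0.5, rank / n_features)
            clones.extend(hypermutate(antibody, rate) for _ in range(n_clones))
        # Parents compete with their clones, so the best mask is never lost.
        pool = sorted(set(ranked + clones), key=score, reverse=True)
        population = pool[:pop_size - n_fresh] + [random_mask() for _ in range(n_fresh)]
    return sorted(set(population), key=score, reverse=True)[:n_return]
```

Returning several distinct high-affinity masks, rather than only the single best one, matches the abstract's use of multiple feature subsets to define multiple views; each returned mask can be turned into an index list with `[j for j, bit in enumerate(mask) if bit]`.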